
1 RL-Based Routing in Biomedical Mobile Wireless Sensor Networks Using Trust and Reputation. Miss Yanee Naputta. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Telecommunication Engineering, Suranaree University of Technology, Academic Year 2012.

2 RL-BASED ROUTING IN BIOMEDICAL MOBILE WIRELESS SENSOR NETWORKS USING TRUST AND REPUTATION Yanee Naputta A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Telecommunication Engineering Suranaree University of Technology Academic Year 2012

3 RL-BASED ROUTING IN BIOMEDICAL MOBILE WIRELESS SENSOR NETWORKS USING TRUST AND REPUTATION Suranaree University of Technology has approved this thesis submitted in partial fulfillment of the requirements for a Master's Degree. Thesis Examining Committee (Asst. Prof. Dr. Peerapong Uthansakul) Chairperson (Asst. Prof. Dr. Wipawee Hattagam) Member (Thesis Advisor) (Asst. Prof. Dr. Paramate Horkaew) Member (Prof. Dr. Sukit Limpijumnong) Vice Rector for Academic Affairs (Assoc. Prof. Flt. Lt. Dr. Kontorn Chamniprasart) Dean of Institute of Engineering

4 YANEE NAPUTTA : ROUTING IN BIOMEDICAL MOBILE WIRELESS SENSOR NETWORKS WITH REINFORCEMENT LEARNING USING TRUST AND REPUTATION (RL-BASED ROUTING IN BIOMEDICAL MOBILE WIRELESS SENSOR NETWORKS USING TRUST AND REPUTATION). THESIS ADVISOR : ASST. PROF. WIPAWEE HATTAGAM, Ph.D., 68 PP.

Biomedical sensor networks have become a promising approach for monitoring people's health both at home and in hospital. Their application is especially suitable for elderly and disabled people who prefer to move around rather than being confined to a particular place. Such networks allow continuous monitoring of a patient's physiological information: sensors are attached to the patient's body and the data are relayed back to the medical center. To support such biomedical applications, network performance parameters such as the packet delivery success ratio and the end-to-end delay must meet the required targets to ensure that data packets can be delivered to the medical center. However, in more realistic scenarios, some nodes do not cooperate with other nodes, for example by refusing to forward the packets they receive, whether because of battery depletion, node damage, or unexplained misbehavior, which degrades network performance.

The objective of this research is therefore to propose an improved routing method for biomedical mobile wireless sensor networks by integrating a reinforcement learning (RL) algorithm with a trust and reputation mechanism, called QRT, and to compare it with an existing method called the reinforcement learning based routing protocol (RL-QRP) algorithm and with a non-learning method called the threshold algorithm. Simulations were carried out under conditions of node mobility, node non-cooperation, and required end-to-end packet delay. This research studied three routing performance metrics: the average success ratio, the average end-to-end delay, and the number of discovered paths for each path length. The results show that the proposed QRT algorithm outperforms the existing RL-QRP algorithm and the threshold algorithm in terms of the average success ratio.

5 Under non-cooperative node conditions, the improvement was up to 11% and 25%, respectively, and under node mobility conditions up to 9% and 22%, respectively. Moreover, in the case of a required end-to-end packet delay, the QRT algorithm achieved an average success ratio up to 11% higher than the RL-QRP algorithm. The experimental results indicate that the trust and reputation approach can be applied to improve routing in mobile wireless sensor networks containing non-cooperative nodes, making them more effective under applications with constrained end-to-end delay.

School of Telecommunication Engineering    Student's Signature
Academic Year 2012    Advisor's Signature

6 YANEE NAPUTTA : RL-BASED ROUTING IN BIOMEDICAL MOBILE WIRELESS SENSOR NETWORKS USING TRUST AND REPUTATION. THESIS ADVISOR : ASST. PROF. WIPAWEE HATTAGAM, Ph.D., 68 PP.

MOBILE WIRELESS SENSOR NETWORKS/ REINFORCEMENT LEARNING/ TRUST AND REPUTATION/ ROUTING/ NON-COOPERATIVE

Biomedical sensor networks have become a potential solution for monitoring the health of people at home and in hospital. Their application is especially suitable for elderly and disabled people who may prefer to be on the move rather than confined to a particular area. Such networks allow continuous monitoring of the patient's physiological information: sensors are attached to the body and their readings are relayed back to the medical center. To support such applications, network performance metrics such as the packet delivery ratio and the end-to-end delay must be satisfied to ensure that data packets can be routed and reliably delivered to the medical center. However, in a more realistic scenario, some nodes do not cooperate with each other (i.e., they drop packets they receive), whether due to node battery depletion, malfunctioning, or simply misbehaving for unknown reasons, thereby degrading network performance.

The underlying aim of this research is therefore to propose an enhancement to RL-based routing in biomedical mobile wireless sensor networks by integrating it with trust and reputation, called QRT, and to compare it with an existing scheme that finds optimal paths through experience and rewards for biomedical sensor networks, called the reinforcement learning based routing protocol (RL-QRP) algorithm, and with a non-learning algorithm called the threshold scheme. Simulations were conducted under

7 different mobility, malicious node, and end-to-end delay requirement conditions. The routing performance metrics studied in this research were the average success ratio, the average end-to-end delay, and the number of discovered paths for each path length.

The experimental results showed that the proposed QRT algorithm can outperform the existing RL-QRP algorithm and the threshold scheme in terms of the average success ratio by up to 11% and 25%, respectively, in the malicious node variation case, and by up to 9% and 22%, respectively, in the node mobility variation case. Furthermore, in the end-to-end delay requirement case, QRT gained up to 11% over the RL-QRP algorithm. The results of our experiments suggest that trust and reputation can be applied to improve routing in the presence of malicious nodes in mWSNs for applications with stringent end-to-end delay requirements.

School of Telecommunication Engineering    Student's Signature
Academic Year 2012    Advisor's Signature

8 ACKNOWLEDGEMENT

I am grateful to all those who, by their direct or indirect involvement, have helped in the completion of this thesis.

First and foremost, I would like to express my sincere thanks to my thesis advisor, Asst. Prof. Dr. Wipawee Hattagam, for her invaluable help and constant encouragement throughout the course of this research. I am most grateful for her teaching and advice, not only on research methodologies but also on many other matters in life. I would not have achieved this much, and this thesis would not have been completed, without all the support that I have always received from her. In addition, I am grateful to the lecturers in the School of Telecommunication Engineering for their suggestions and all their help.

I would also like to express my thanks to Dr. Kae Hsiang Kwong, a senior research fellow at the University of Strathclyde, Scotland, for granting me the opportunity to do research in Scotland. I would also like to thank Asst. Prof. Dr. Peerapong Uthansakul and Asst. Prof. Dr. Paramate Horkaew for agreeing to serve on my committee.

My sincere gratitude goes to the Telecommunication Research Industrial and Development Institute (TRIDI), National Telecommunication Commission Fund, Thailand, for the scholarship throughout my studies and for the fruitful discussions and insights received from all the progress update meetings. My sincere appreciation goes to Ms. Pranitta Arthans for her valuable administrative support during the course of my dissertation.

9 Finally, I am most grateful to my parents and to my friends in both the master's and doctoral degree courses for all their support throughout the period of this research. Yanee Naputta

10 TABLE OF CONTENTS
ABSTRACT (THAI) I
ABSTRACT (ENGLISH) III
ACKNOWLEDGEMENTS V
TABLE OF CONTENTS VII
LIST OF TABLES XI
LIST OF FIGURES XII
SYMBOLS AND ABBREVIATIONS XIV
CHAPTER
I INTRODUCTION
  Significance of the Problem
  Research Objectives
  Research Hypothesis
  Basic Agreements
  Scope and Limitation
  Research Methodology
    Progressions
    Research Methodology
    Research Location
    Research Equipment 8

11 TABLE OF CONTENTS (Continued)
    Data Collection
    Data Analysis
  Expected Benefit
  Organization of Thesis 9
II BACKGROUND THEORY
  Introduction
  Markov Decision Process Theory
    Markov Property
    Markov Decision Process
    Policy
  Reinforcement Learning
    The Value Function
    The Optimal Value Function
  Q-learning
    Exploration
  Trust and Reputation
    Representation and Update: Binary Ratings
    Reputation and Update: Interval Rating
    Trust
  Summary 27

12 TABLE OF CONTENTS (Continued)
III RL-BASED ROUTING IN BIOMEDICAL MOBILE WIRELESS SENSOR NETWORKS USING TRUST AND REPUTATION
  Introduction
  Reinforcement Learning based Routing Protocol with QoS Support for Biomedical Sensor Networks (RL-QRP)
  Reputation
  RL-QRP with Trust and Reputation
  Performance Evaluation
    Unconstrained Traffic Demand
      Part 1 Malicious Nodes Effect
      Part 2 Mobility Effect
    Traffic Demand with End-to-End Delay QoS
  Conclusion 50
IV CONCLUSION AND FUTURE WORK
  Conclusion
    QRT
    Quality-of-Service
  Future Work
    mWSNs with Indirect Reputation Value
    Traffic Priority
    Performance Evaluation of Test Bed 55

13 TABLE OF CONTENTS (Continued)
    mWSNs with Energy Consumption Condition 55
REFERENCES 56
APPENDIX A PUBLICATION 61
BIOGRAPHY 68

14 LIST OF TABLES
3.1 QRT Routing Algorithm
3.2 Simulation Parameters
3.3 Simulation Parameters 44

15 LIST OF FIGURES
2.1 A MDP model
2.2 Diagram of agent-environment interaction in reinforcement learning
3.1 RL-QRP routing model
3.2 Average success ratio of discovered paths
3.3 Average end-to-end delay of discovered paths
3.4 Number of discovered paths length for 9 malicious nodes
3.5 Average success ratio under various degrees of mobility
3.6 Average end-to-end delay of discovered paths under various degrees of mobility
3.7 Average number of discovered path length under various degrees of mobility
3.8 Average success ratio under different end-to-end delay requirements and probability of malicious node =
3.9 Average success ratio under different end-to-end delay requirements and probability of malicious node =
3.10 Average end-to-end delay under different end-to-end delay requirements and probability of malicious node =
3.11 Average end-to-end delay under different end-to-end delay requirements and probability of malicious node =

16 LIST OF FIGURES (Continued)
3.12 Number of discovered path under different end-to-end delay requirements = 100 msec and probability of malicious node =
3.13 Number of discovered path under different end-to-end delay requirements = 100 msec and probability of malicious node =
3.14 Number of discovered path under different end-to-end delay requirements = 200 msec and probability of malicious node =
3.15 Number of discovered path under different end-to-end delay requirements = 200 msec and probability of malicious node =

17 SYMBOLS AND ABBREVIATIONS
WSNs = Wireless sensor networks
ECG = Electrocardiogram
mWSN = Mobile wireless sensor network
MAC = Media access control
GPS = Global positioning system
RL = Reinforcement learning
QoS = Quality-of-service
RFSN = Reputation-based framework for sensor networks
RL-QRP = A reinforcement learning based routing protocol with QoS support for biomedical sensor networks
MDP = Markov decision process
C = Criticality of the routing device
t = Time step index
α = Learning rate
S_t = State of the process at time t
S = State space
s = Current state
s' = Next state
A = Action space
a = Action

18 SYMBOLS AND ABBREVIATIONS (Continued)
E[.] = Expectation operator
γ = Discount factor
R(s, a, s') = Expected reward given current state s and action a, with next state s'
r = Reward
π = Policy
π* = Optimal policy
P[A] = Distribution over the action space
Q_t^π(s, a) = Action-value function of a given policy π for the state-action pair (s, a) at time t
R_t = Expected discounted return of the agent at time t
E_π[.] = Expectation operator under policy π
V^π(s) = Value function of a state s under policy π
V*(s) = Value function of a state s under the optimal policy
Q*(s, a) = Action-value function of the optimal policy for the state-action pair (s, a)
i = Class of message
θ = Reputation value
p(θ) = Prior distribution
Γ(.) = Gamma function

19 SYMBOLS AND ABBREVIATIONS (Continued)
D(δ) = Dirichlet process
δ = Base measure
T_ij = Trust metric
R_ij = Reputation metric
Q(s, a) = Quality of action a at state s
γ = Discount factor
Q(s', a') = Expected future reward at state s' obtained by taking action a'
D(s_i, s_sink) = Distance between node s_i and the destination node
D(s_j, s_sink) = Distance between node s_j and the destination node
D(s_i, s_j) = Distance between node s_i and node s_j
T_Q = End-to-end delay requirement
T_delay(s_i, s_j) = Experienced delay between node s_i and node s_j
N = Number of sensor nodes
p = Number of success events
n = Number of failure events
l_ij = Level of trust at node s_j experienced by s_i
r = Reward function

20 CHAPTER I
INTRODUCTION

This chapter introduces a background on routing problems in biomedical mobile wireless sensor networks and highlights the significance of improving routing performance in such networks. It also presents the motivation for applying trust and reputation with reinforcement learning to provide a good routing solution, which is the main focus of this thesis.

1.1 Significance of the Problem

A wireless sensor network (WSN) is a network of small devices, called sensor nodes, that are embedded in the real world to collect measurements of interest, e.g., humidity in the air, soil moisture, ambient temperature, pH, etc. There are numerous applications for wireless sensor networks, e.g., battlefield surveillance, medical care, wildlife monitoring and disaster response. In this research, we are interested in biomedical wireless sensor networks, which measure vital sign parameters such as body temperature, blood pressure, electrocardiogram (ECG), pulse oximetry and heart rate. These parameters are sensed at a patient and transmitted to a base station at a medical center. The data is used for health status monitoring, diagnosis, treatment and further analysis. For example, Varshney (2008) and Jovanov (2009) proposed the use of wireless sensors to monitor vital signs of patients in a hospital environment.

21 In medical sensor networks used for monitoring disabled/elderly patients, sensor nodes are attached to a patient's body to collect physiological information. In case of emergency, patients may be moved to an emergency room, or disabled/elderly patients may be on the move in the hospital, and medical staff may want to know their information continuously. Therefore, a mobile wireless sensor network (mWSN) system is necessary for biomedical sensor networks. Ying Hong Wang (2008) and Nguyen, Defago, Beuran and Shinoda (2008) conducted initial studies on the overall network lifetime in mWSNs. Mobility can further aggravate delay problems: as current paths become disconnected, new paths must be found to replace them.

Most of the fundamental characteristics of mobile wireless sensor networks are the same as those of normal static WSNs. Some major differences, however, are as follows.

1) Due to mobility, mobile WSNs have a much more dynamic topology compared to static WSNs. It is often assumed that a sink will move continuously in a random fashion, thus making the whole network dynamic.

2) It can be reasonably assumed that a gateway sink has unlimited energy, computation and storage resources. The depleted batteries of mobile sinks can be recharged or replaced with fresh ones, and mobile sinks have access to computational and storage devices.

3) The increased mobility in the case of mobile WSNs imposes some restrictions on the already proposed routing and MAC level protocols for WSNs (Zhou, Xing, and Yu, 2006). Most of the protocols designed for static WSNs perform poorly in the case of mWSNs.

22 4) Due to the dynamic topology of mWSNs, communication links can often become unreliable. This can be aggravated even further in hostile or remote areas where the availability of constant communication channels is low.

5) Because of the mobility, location estimation plays an important role in maintaining accurate knowledge of the location of the sinks or nodes. The location of the sinks or nodes can be obtained from GPS (Kim and Hong, 2009; Yadav, Mishra, and Gore, 2009; Kim, Lee, Yoon and Han, 2009).

From the aforementioned works, the design of mobile routing is a significant and challenging field. Nowadays, however, there is little research on routing in mWSNs. A routing technique which is suitable for mWSNs (Xuedong, Balasingham, and Byun, 2008) applies reinforcement learning, a distributed, self-adaptive, lightweight mechanism, to determine paths in a hop-by-hop manner.

Reinforcement learning (RL) is a technique used to support routing in dynamic topology networks. RL is the study of how animals and artificial systems can learn to optimize their behavior by using their experience through rewards and punishments. RL algorithms have been developed to approximate solutions to sequential optimal control problems. In the standard reinforcement learning model, an agent is connected to its environment via state perception and action (Kaelbling, Littman, and Moore, 1996). There are some works which applied RL to solve routing problems in static WSNs (Karaki and Kamal, 2004; Aghaei, Rahman, and Saddik, 2007; Forster and Murphy, 2007; 2008; Wang, 2006; Dong, Agrawal, and Sivalingam, 2007). Apart from routing, some studies (Seah, Tham, Srinivasan, and Xin, 2007; Renaud and Tham, 2006) used RL to solve coverage problems in static WSNs. Xuedong, Balasingham, and Byun (2008) proposed a QoS routing scheme in mobile wireless

23 sensor networks for biomedical sensor networks. In their research, they investigated the impact of network traffic load and sensor node mobility on the network performance. However, they considered cooperative mWSNs. As mentioned above, a more realistic scenario would require consideration of situations in which some nodes do not cooperate with others.

Most routing or packet forwarding schemes in the previous literature assume that nodes function properly and are trustworthy and cooperative. However, in realistic scenarios, nodes may fail to cooperate in the network due to node battery depletion, malfunctioning or simply misbehaving for unknown reasons. The most important task of biomedical sensor networks is to ensure that data is delivered to the medical center or the destination node. Reputation and trust systems have proven to be useful for detecting misbehaving nodes (faulty or malicious) and for assisting the decision-making process. Reputation systems have been widely studied in the context of several diverse domains; systems such as eBay (Resnick and Zeckhauser, 2000), Yahoo auctions (Resnick et al., 2000), and Internet-based systems such as Keynote (Blaze et al., 1996) maintain reputation metrics at a centralized trusted authority. Some research designed reputation systems for ad-hoc networks, e.g., Confidant (Buchegger and Boudec, 2002) and Core (Michiardi and Molva, 2002). These systems are distributed and also maintain a statistical representation by borrowing tools from the realm of game theory. These systems try to counter selfish routing misbehavior of nodes by enforcing nodes to cooperate with each other. More recently, reputation systems were proposed in the domain of ad-hoc networks that formulate the problem based on Bayesian analysis rather than game theory (Buchegger and Boudec, 2003a, 2003b). These systems can counter any arbitrary misbehavior of nodes. There are some works in the area of reputation and trust

24 systems for WSNs (Ganerial and Srivastava, 2004; Chen, 2007). In their schemes, a sensor node continuously builds a reputation value for other nodes by monitoring their behavior. Then the sensor node uses this reputation value to evaluate the trustworthiness of other nodes. Tanachaiwiwat, Dave, Bhindwale and Helmy (2003) proposed a mechanism of location-centric isolation of misbehavior and trust routing in energy-constrained sensor networks. In their trust model, the trustworthiness value is derived from the capacity of cryptography availability and packet forwarding. Ganerial and Srivastava (2004) proposed a reputation based framework for sensor networks (RFSN) based on beliefs (Josang and Knapskog, 1998) in order to derive reputation values, where each sensor node develops a reputation for each other node by making direct observations about these other nodes in the neighborhood. Reputation is represented through a Bayesian formulation, more specifically a beta reputation system, and is used to help a node evaluate the trustworthiness of other sensor nodes and then make decisions within the network. Furthermore, the statistical foundations of the RFSN algorithm can be reduced to a few basic mathematical operations of addition, subtraction, multiplication and division. So, RFSN can run on resource-constrained devices and is available as a middleware service on Motes.

For these reasons, this research aims to handle routing in non-cooperative biomedical mWSNs using a scalable routing mechanism for mWSNs, namely a reinforcement learning scheme, integrated with a reputation and trust system for detecting and screening out malicious node behavior in mWSNs. We also study the effects of mobility, the quantity of malicious nodes and quality-of-service requirements. We finally propose a good routing strategy for mWSNs which can handle mobility, malicious nodes and end-to-end delay requirement conditions.

25 1.2 Research Objectives
1. To study the effects of the RL algorithm on the routing performance in mWSNs.
2. To apply reputation and trust systems to solve the routing problem in mWSNs and compare with the existing routing algorithm.
3. To study the performance of QoS routing in mWSNs.

1.3 Research Hypothesis
1. RL can provide a good routing solution in mWSNs.
2. Some sensor nodes are uncooperative due to various reasons such as node battery depletion, malfunctioning or simply misbehaving for unknown reasons.
3. Reputation and trust can avoid misbehaving nodes in mWSNs.

1.4 Basic Agreements
1. Visual C++ was used to simulate the routing protocols in mWSNs.
2. Some data in the experiments were normalized to facilitate analysis and obtain conclusions.

1.5 Scope and Limitation
1. RL methods were studied to find a good routing strategy in mWSNs.
2. Reputation and trust were studied and applied to the RL algorithm in mWSNs. Results were compared with the existing RL-QRP algorithm.
3. Simulations were carried out in Visual C++. The experimental results were analyzed to find a suitable routing strategy for biomedical mWSNs.

26 1.6 Research Methodology

Progressions
1. Review of literature and related theories.
2. Study the existing routing methodologies in mWSNs and their performance.
3. Test the proposed reputation and trust systems with the RL algorithm by simulation using Visual C++ to solve routing problems in mWSNs.
4. Analyze and conclude results.
5. Prepare publication.
6. Write thesis.

Research Methodology
Objective 1: To study routing problems in mWSNs.
1. Review literature and related works about routing in mWSNs.
2. Determine the advantages and disadvantages of the routing methods chosen as benchmarks for this thesis.
3. Apply simulation tools such as Visual C++ to evaluate routing in mWSNs under various scenarios.
4. Design experiment scenarios to evaluate an existing routing algorithm (Xuedong, Balasingham, and Byun, 2008) which used a reinforcement learning method called RL-QRP to find the route.
5. Under various network scenarios, measure the following parameters to evaluate the performance of RL-QRP: the average success ratio, the average end-to-end delay and the number of discovered paths for each path length.

27 Objective 2: To apply reputation and trust systems with RL-QRP to solve the misbehaving-node routing problem in mWSNs and compare with the original RL-QRP.
1. Survey reputation and trust methods.
2. Add malicious nodes into the RL-QRP algorithm.
3. Apply the reputation and trust method to the RL-QRP algorithm.
4. Compare the results with the original RL-QRP algorithm by considering the following parameters: the average success ratio, the average end-to-end delay and the number of discovered paths for each path length.
5. Add a QoS condition in terms of the end-to-end delay requirement to the network and compare the results of the QRT and RL-QRP algorithms by considering the following parameters: the average success ratio, the average end-to-end delay and the number of discovered paths under different end-to-end delay requirements.

Research Location
1. Wireless Communication Research and Laboratory, Factory Building 4 (F4), 111 University Avenue, Muang District, Nakhon Ratchasima 30000, Thailand.
2. Centre for Dynamic Intelligent Communications (CIDCOM) within the Department of Electronic and Electrical Engineering, Strathclyde University, Royal College Building, 204 George Street, Glasgow G1 1XW, Scotland.

Research Equipment
1. Personal computer
2. Visual C++ software

28 Data Collection
1. Information collected by reviewing literature and related works.
2. Data collected from Visual C++ simulations.

Data Analysis
Data collected from the sensor node simulations were analyzed, compared and summarized in terms of graphs and tables.

1.7 Expected Benefit
1. A suitable routing strategy for mWSNs which contain misbehaving nodes.
2. Improved routing reliability in mWSNs.

1.8 Organization of Thesis
The remainder of this thesis is organized as follows. Chapter 2 presents the theoretical background which underlies the contribution of this thesis: first an introduction to related works, followed by an introduction to Markov decision process theory, reinforcement learning (RL) and Q-learning; finally, the basic theory of reputation and trust, which is integrated with the RL process to enhance routing in mWSNs containing malicious nodes, is presented. In the first part of Chapter 3, we study the existing RL-QRP algorithm and formulate reputation and trust to evaluate the routing performance in mWSNs under various mobility and malicious node conditions. The proposed algorithm, which integrates RL-QRP with reputation and trust and is called QRT, and the original RL-QRP were compared in terms of the average success ratio and the average end-to-end delay. The routing performance results were evaluated and compared between the RL-QRP and QRT algorithms under different conditions of malicious node behavior, mobility and end-to-end delay requirements.

29 Chapter 4 summarizes all findings and original contributions of this thesis and points out possible future research directions.

30 CHAPTER II
BACKGROUND THEORY

2.1 Introduction

This thesis proposes a reinforcement learning based routing mechanism for biomedical mobile wireless sensor networks using trust and reputation. A wireless sensor network (WSN) is a network of small devices, called sensor nodes, that are embedded in the real world to collect measurements of interest. There are numerous applications for wireless sensor networks, e.g., battlefield surveillance, medical care, wildlife monitoring and disaster response. In this research, we are interested in biomedical wireless sensor networks, in which parameters such as body temperature, blood pressure, electrocardiogram (ECG), pulse oximetry (SpO2) and heart rate are sensed at a patient and transmitted to a base station at a medical center. The main function of biomedical sensor networks is to ensure that data packets can be sensed and delivered to the medical center reliably and efficiently. Thus, the routing protocol plays an important role in the communication stack and has a significant impact on the network performance. However, some sensor nodes may not cooperate with each other. Nodes may drop packets they receive due to node battery depletion, malfunctioning or simply misbehaving for unknown reasons. Therefore, the main focus of this thesis is to solve the routing problem for non-cooperative mWSNs based on RL by incorporating a reputation and trust mechanism.

31 Reinforcement learning (Sutton and Barto, 1998) is the study of how animals or machines can learn to optimize their behavior to obtain rewards and to avoid punishments. This learning scheme permits a decision maker to learn its optimal decisions (actions) through a series of trial-and-error interactions with a dynamic environment. Its main idea is to reinforce good behaviors of the decision maker while discouraging bad behaviors, through a scalar reward value returned by the environment. RL relies on the assumption that the dynamics of the system satisfy a Markov decision process (MDP). Q-learning (Watkins, 1989) is a reinforcement learning technique that approximates the optimal action-value function, which is a function that gives the expected reward for taking a given action in a given state and following a fixed policy thereafter. One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment.

Reputation and trust systems are widely used in diverse domains, for example e-commerce systems such as eBay (Resnick and Zeckhauser, 2000) and Yahoo auctions (Resnick et al., 2000). These systems try to counter selfish routing misbehavior of nodes by enforcing nodes to cooperate with each other. Therefore, this chapter introduces the basic theory of reputation and trust systems and the theory behind reinforcement learning. It also serves as an introduction to the Q-learning algorithm, which is the basis of this thesis. The next section provides a background theory of the Markov decision process (MDP), followed by reinforcement learning (RL) and the reputation and trust process. A summary is presented in the final section.

32 2.2 Markov Decision Process Theory

A Markov decision process (MDP) is a model of a decision-maker interacting synchronously with the environment. Since the decision-maker sees the environment's true state, it is referred to as a completely observable Markov decision process. The basics of Markov decision processes are presented as follows.

2.2.1 Markov Property

The Markov property refers to the memoryless property of a stochastic process. A stochastic process has the Markov property if the conditional probability distribution of future states of the process depends only upon the present state, not on the sequence of events that preceded it. A process with this property is called a Markov process. The Markov property states that anything that has happened so far can be summarized by the current state S_t. Therefore, the probability of being in the next state at time t+1, based on the past history of state changes, can be defined simply as the conditional probability based on the current state at time t:

P(S_{t+1} = s_{t+1} | S_t = s_t, ..., S_0 = s_0) = P(S_{t+1} = s_{t+1} | S_t = s_t).   (2.1)

This equation is referred to as the Markov property. In other words, a stochastic process has the Markov property if the probability distribution of future states of the process at time t+1, given the present state at time t and all past states, depends only upon the present state and not on any past states.

33 2.2.2 Markov Decision Process

The probability that the process chooses s' as its new state is influenced by the chosen action. Specifically, it is given by the state transition probability function. Thus, the next state s' depends on the current state s and the decision-maker's action a. But given s and a, it is conditionally independent of all previous states and actions. In other words, the state transitions of an MDP possess the Markov property. This state transition probability function is defined by

P(s' | s, a) = P(S_{t+1} = s' | S_t = s, a_t = a).   (2.2)

Similarly, given any current state and action, s and a, together with any next state, s', the expected value of the incurred reward is

R(s, a, s') = E[r_{t+1} | S_t = s, a_t = a, S_{t+1} = s'],   (2.3)

where E[.] is the expectation operator and r_{t+1} is the reward received at time t+1. Equations (2.2) and (2.3) completely specify the most important aspects of the dynamics of the MDP. The simulation program requires exact knowledge of these two functions in order to determine the optimal policy. An MDP model is shown in Figure 2.1.

Figure 2.1 A MDP model.

34 A Markov decision process is a 4-tuple (S, A, P, R) which describes the MDP characteristics, where S denotes the set of states, A is a finite set of actions, P is the probability that action a in state s at time t will lead to state s' at time t+1, and R is the immediate reward (or expected immediate reward) received after the transition to state s' from state s after having taken action a ∈ A. Let P(s' | s, a) ∈ P be the state transition model that denotes the probability of transitioning to the next state s' ∈ S after an agent takes action a ∈ A in the current state s ∈ S.

2.2.3 Policy

A policy π is a description of the behavior of a decision-maker, or a function mapping states to actions, π : S → A. There are two types of policies. A stationary policy is a situation-action mapping, i.e., it specifies an action to be taken at each state. The choice of action depends only on the state and is independent of the time step. A non-stationary policy, on the other hand, is a sequence of situation-action mappings, indexed by time. In this thesis, we focus on stationary policies since our data acquisition problem is based on models of sensor readings which are obtained in a particular time frame, such as in the mornings, afternoons, etc. Hence, within such a period, the model may be considered stationary, and hence the policy is also assumed stationary.

The objective of solving an MDP is to find a policy π, defined as a mapping of the state space to the action space, π : S → P[A], where P[A] is the distribution over the action space. The action-value function Q_t^π(s, a) of a given policy π associates a state-action pair (s, a) with an expected reward for performing action a in state s at time step t under policy π.

35 To achieve this objective, particularly in scenarios where the dynamics of the environment are difficult to model (such as in mWSNs), a technique called reinforcement learning can be used to solve MDPs.

2.3 Reinforcement Learning

Reinforcement learning (RL) is a computational approach concerned with how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward. In machine learning, the environment is typically formulated as a Markov decision process (MDP), and many reinforcement learning algorithms for this context are highly related to dynamic programming techniques. The main difference from these classical techniques is that reinforcement learning algorithms do not need knowledge of the MDP and they target large MDPs where exact methods become infeasible. The learner is not taught which action to take, as in most forms of machine learning, but instead must discover which actions yield the most reward through trial-and-error interactions with its environment (Sutton and Barto, 1998).

A reinforcement learning agent interacts with its environment in discrete time steps. At each time t, the agent receives an observation, which typically includes the reward r_t. It then chooses an action a_t from the set of available actions. The environment then moves to a new state s_{t+1} and the reward r_{t+1} associated with the transition (s_t, a_t, s_{t+1}) is determined. The goal of a reinforcement learning agent is to collect as much reward as possible. Figure 2.2 shows the agent-environment interaction in reinforcement learning.

36 Figure 2.2 Diagram of agent-environment interaction in reinforcement learning.

2.3.1 The Value Function

Define the value function V^π(s) of a policy π by

V^π(s) = E_π[ R_t | S_t = s ] = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | S_t = s ],   (2.4)

where R_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... = Σ_{k=0}^{∞} γ^k r_{t+k+1} is the expected discounted return of the agent, γ is the discount factor with 0 ≤ γ ≤ 1, and E_π[.] is the expectation operator under policy π. Similarly, the action-value function Q_t^π(s, a) of a given policy π associates a state-action pair (s, a) with an expected reward for performing action a in state s at time step t and following π thereafter:

Q_t^π(s, a) = E_π[ R_t | S_t = s, a_t = a ] = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | S_t = s, a_t = a ].   (2.5)

37 2.3.2 The Optimal Value Function

Solving a reinforcement learning task means, roughly, finding a policy that achieves the maximum reward over the long run. The optimal value function, denoted V*(s), is defined as the maximum state value function over all possible policies at state s:

V*(s) = max_π V^π(s).   (2.6)

Optimal policies also share the same optimal action-value function, denoted Q*(s, a), defined by

Q*(s, a) = max_π Q^π(s, a).   (2.7)

The standard solution to the problem above is through an iterative search method (Puterman, 1994) that searches for a fixed point of the following Bellman equation:

V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V*(s') ].   (2.8)

Equation (2.8) is a form of the Bellman optimality equation for V*(s). The Bellman optimality equation for Q*(s, a) is

Q*(s, a) = R(s, a) + γ Σ_{s'} P(s' | s, a) max_{a'} Q*(s', a').   (2.9)
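To make the fixed-point computation of Equations (2.6)–(2.8) concrete, the following is a minimal C++ sketch of value iteration on a small MDP. The two-state transition probabilities and rewards are illustrative placeholders chosen here, not values from this thesis; the sketch only demonstrates how the Bellman optimality operator is applied repeatedly until V converges.

#include <algorithm>
#include <cmath>
#include <cstdio>

// Value iteration sketch: repeatedly apply the Bellman optimality
// operator of Equation (2.8) until the value function stops changing.
// The 2-state, 2-action MDP below is an illustrative placeholder.
int main() {
    const int S = 2, A = 2;
    const double gamma = 0.9, tol = 1e-6;
    // P[s][a][s'] : transition probabilities, R[s][a] : expected rewards.
    double P[S][A][S] = {{{0.8, 0.2}, {0.1, 0.9}},
                         {{0.5, 0.5}, {0.0, 1.0}}};
    double R[S][A] = {{1.0, 0.0},
                      {2.0, 0.5}};
    double V[S] = {0.0, 0.0};

    double delta;
    do {
        delta = 0.0;
        for (int s = 0; s < S; ++s) {
            double best = -1e9;
            for (int a = 0; a < A; ++a) {
                double q = R[s][a];                        // immediate reward
                for (int s2 = 0; s2 < S; ++s2)
                    q += gamma * P[s][a][s2] * V[s2];      // discounted future value
                best = std::max(best, q);
            }
            delta = std::max(delta, std::fabs(best - V[s]));
            V[s] = best;                                   // in-place (asynchronous) sweep
        }
    } while (delta > tol);

    std::printf("V*(0) = %.3f, V*(1) = %.3f\n", V[0], V[1]);
    return 0;
}

Extracting arg max over actions from the converged values yields the optimal policy of Equation (2.12) further below.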

38 2.4 Q-learning

Q-learning is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment. Q-learning (Sutton and Barto, 1998) defines a learning method within an MDP that is employed in single-agent RL systems. Q-learning is an algorithm that does not need a model of the environment and can directly approximate the optimal action-value function (Q-value) through online learning.

Assume that the learning agent exists in an environment described by some set of possible states s ∈ S. It can perform any of the possible actions a ∈ A. The interaction between the agent and the environment at each instant consists of the following sequence:
- The agent senses the state s_t ∈ S.
- Based on s_t, the agent performs an action a_t ∈ A.
- As a result, the environment makes a transition to the new state s' = s_{t+1} ∈ S.
- The agent receives a real-valued reward (payoff) r_t that indicates the immediate reward value of this state-action transition.

The task of the agent is to learn a policy π : S → A for selecting its next action a_t = π(s_t) based only on the current state s_t. For a policy π, the Q-value Q^π(s, a) (or state-action value) is the expected discounted return for executing action a at state s and then following policy π thereafter. The optimal policy π*(s) is the policy that maximizes the total expected discounted reward received over an infinite horizon.

39 The Q-learning process tries to find Q(s, a) ≈ Q*(s, a) in a recursive manner using the available information (s_t, a_t, s', a', r_t), where s_t and s' are the states at time t and t+1, respectively, a_t and a' are the actions at time t and t+1, respectively, and r_t is the immediate reward due to a_t. The Q-learning rule at time step t+1 is given by

Q_{t+1}(s_t, a_t) = (1 − α) Q_t(s_t, a_t) + α [ r_t + γ max_{a'} Q_t(s', a') ],   (2.10)

where 0 ≤ γ ≤ 1 is the discount factor, 0 ≤ α ≤ 1 is the learning rate, and Q_t(s', a') is the action-value function for the next state s' and next action a'.

2.4.1 Exploration

One of the most important issues for the Q-learning algorithm is maintaining a balance between exploration and exploitation. Normally, the convergence theorem of Q-learning requires that all state-action pairs (s, a) are tried infinitely often (Sutton and Barto, 1998). Such a balanced condition is satisfied by selecting the greedy action with some probability and exploring new actions otherwise, where the greedy action is

a* = arg max_{a ∈ A} Q(s, a).   (2.11)

This selection rule, termed ε-greedy, significantly speeds up the convergence of the Q-value function. If the Q-value of each admissible (s, a) pair is visited infinitely often, and if the learning rate is decreased to zero in a suitable way, then as t → ∞, Q_t(s, a) converges to Q*(s, a) with probability 1 (Sutton and Barto, 1998). The optimal policy is defined by

π*(s) = arg max_{a ∈ A(s)} Q*(s, a).   (2.12)
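The update rule (2.10) together with ε-greedy selection (2.11) can be written compactly in C++. The sketch below uses a hypothetical toy environment (a five-state chain with a reward at one end) purely so that the code is self-contained; the function and parameter names are assumptions made here for illustration and are not part of the thesis simulator.

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

// Tabular Q-learning with epsilon-greedy exploration, following
// Equations (2.10) and (2.11).
struct Step { int nextState; double reward; };

// Toy environment: 5 states in a line, action 0 moves left, action 1
// moves right; reaching state 4 yields a reward of 1.
Step step(int s, int a) {
    int next = (a == 1) ? std::min(s + 1, 4) : std::max(s - 1, 0);
    return { next, next == 4 ? 1.0 : 0.0 };
}

// Epsilon-greedy action selection: explore with probability eps,
// otherwise take the greedy action of Equation (2.11).
int chooseAction(const std::vector<std::vector<double>>& Q, int s, double eps) {
    int A = (int)Q[s].size();
    if ((double)std::rand() / RAND_MAX < eps)
        return std::rand() % A;
    int best = 0;
    for (int a = 1; a < A; ++a)
        if (Q[s][a] > Q[s][best]) best = a;
    return best;
}

int main() {
    const int S = 5, A = 2;
    const double alpha = 0.1, gamma = 0.9, eps = 0.1;
    std::vector<std::vector<double>> Q(S, std::vector<double>(A, 0.0));
    std::srand((unsigned)std::time(nullptr));

    for (int episode = 0; episode < 200; ++episode) {
        int s = 0;
        for (int t = 0; t < 100 && s != 4; ++t) {
            int a = chooseAction(Q, s, eps);
            Step st = step(s, a);
            double maxNext = *std::max_element(Q[st.nextState].begin(),
                                               Q[st.nextState].end());
            // One-step Q-learning update, Equation (2.10).
            Q[s][a] = (1.0 - alpha) * Q[s][a]
                    + alpha * (st.reward + gamma * maxNext);
            s = st.nextState;
        }
    }
    std::printf("Q(0, right) = %.3f\n", Q[0][1]);
    return 0;
}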

40 2.5 Trust and Reputation

In this section, we describe techniques for estimating a reputation θ based on transactional data. A transaction occurs whenever two nodes make an exchange of information or participate in a collaborative process. With each exchange, the nodes generate ratings indicating the degree of cooperation of their partner node. For the moment, we consider the reputation θ as representing the probability that a given node will cooperate when asked to exchange information. Therefore, our reputations θ are contained in the unit interval [0, 1], and values of θ closer to one suggest greater cooperation. In the next two sections, we discuss a Bayesian framework for updating reputations given the rating from each new transaction. Within this section we address the following topics: the representation of reputation, its update with new transactions, and a trust metric as the output of the reputation.

2.5.1 Representation and Update: Binary Ratings

Suppose a transaction occurs between nodes i and j. Depending on the outcome, node i will assign the value 1 if node j was cooperative and 0 otherwise. Node i will then update its reputation for node j, incorporating this new data. Independently, node j will create its own rating for the exchange and update its opinion of node i. For simplicity, we will focus on the computations carried out by node i, with the understanding that each node in the network will perform similar operations after it completes a transaction.

41 Let θ denote the reputation of node j held by node i. We adopt a classical beta-binomial framework for estimating reputations (Gelman et al., 2003; Josang and Ismail, 2002). Specifically, we assign to θ a prior distribution p(θ) that reflects our uncertainty about the behavior of node j before any transactions with i take place. We will take p(θ) from the beta family, a two-parameter class of distributions which can be expressed as

p(θ) = [Γ(α + β) / (Γ(α) Γ(β))] θ^{α−1} (1 − θ)^{β−1}   (2.13)

for some choice of α and β, where Γ(.) is the gamma function (Gelman et al., 2003). The mean of a beta distribution with parameters (α, β) is α/(α + β) and its variance is αβ/[(α + β)^2 (α + β + 1)]. The beta is chosen, in part, because of its flexibility and its ability to peak at any value in the interval [0, 1] with arbitrarily small variance (Gelman et al., 2003).

Given θ, we then model our binary ratings as Bernoulli observations with success probability θ. That is, let x denote node i's rating of node j for a single transaction. Then, given j's reputation θ, the probability that node j will be cooperative is

p(x | θ) = θ^x (1 − θ)^{1−x}.   (2.14)

Once the transaction is complete, we update our reputation using the posterior distribution for θ:

p(θ | x) = p(x | θ) p(θ) / ∫ p(x | θ) p(θ) dθ.   (2.15)

42 In our case, these expressions become

p(θ | x) ∝ θ^{(α + x) − 1} (1 − θ)^{(β + 1 − x) − 1},   (2.16)

which means the posterior p(θ | x) again has a beta distribution, with parameters α + x and β + (1 − x). The utility of the choice of a beta distribution is now clear because of its relationship with the Bernoulli (binomial) distribution: the beta distribution is the conjugate prior for the Bernoulli distribution. Therefore, our reputation framework requires node i to maintain only two parameters to describe the reputation of node j, with very simple update rules as each new transaction occurs.

Suppose nodes i and j now conduct n transactions with ratings x_1, ..., x_n. Repeating the updates in the previous paragraph, we find that the posterior distribution for θ after n transactions is again beta, with parameters updated as follows:

α_n = α + Σ_{k=1}^{n} x_k,    β_n = β + n − Σ_{k=1}^{n} x_k.   (2.17)

Therefore, after n transactions, the posterior mean of θ is

E[θ | x_1, ..., x_n] = w_n (α / (α + β)) + (1 − w_n) (1/n) Σ_{k=1}^{n} x_k,   (2.18)

where w_n = (α + β) / (α + β + n) is a weight that tends to zero as n → ∞. This form of the update shows clearly that we are doing a weighted average of the prior mean and the mean of the new observations. The weight on the prior mean goes to zero as the number of new observations grows very large.
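Because the entire Bayesian update reduces to the bookkeeping of Equation (2.17), a reputation record only needs to store the two beta parameters. The C++ sketch below illustrates this; the class and member names are assumptions made here for illustration, and the prior α = β = 1 is chosen so that the initial trust value is 0.5, matching the value used later in Chapter III. The same rate() call also covers the interval-rating case of the next section, where x may be any value in [0, 1].

#include <cstdio>

// Minimal beta reputation bookkeeping in the spirit of Section 2.5.1:
// each rating x in [0,1] (1 = cooperative, 0 = uncooperative) simply
// increments the two beta parameters; trust is the posterior mean.
struct BetaReputation {
    double alpha = 1.0;   // prior pseudo-count of cooperative outcomes
    double beta  = 1.0;   // prior pseudo-count of uncooperative outcomes

    // Update per Equation (2.17)/(2.21); works for binary or interval ratings.
    void rate(double x) { alpha += x; beta += 1.0 - x; }

    // Posterior mean of theta, i.e. the trust value.
    double trust() const { return alpha / (alpha + beta); }
};

int main() {
    BetaReputation rep;                      // trust starts at 0.5
    double ratings[] = {1, 1, 0, 1, 0.7};    // observed cooperativeness
    for (double x : ratings) rep.rate(x);
    std::printf("trust = %.3f\n", rep.trust());
    return 0;
}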

43 2.5.2 Reputation and Update: Interval Rating

Now we describe an update for ratings that are not measured on a binary scale but instead are assigned some value in [0, 1]. We can think of these ratings as estimated probabilities, perhaps for the event that a particular data point exchanged between i and j is faulty. Note that the notion of estimated probabilities is much more consistent than binary ratings. In this context, we appeal to a slightly more elaborate framework involving Dirichlet processes (Ferguson, 1973). Let D(δ) be a Dirichlet process with base measure δ and let this be our prior distribution. Given observations x_1, ..., x_n, Ferguson (1973) tells us that the posterior is again a Dirichlet process, with base measure δ + Σ_{k=1}^{n} I_{x_k}, where I_{x_k} is an indicator of a point mass at the location of the observation. As we will describe in Section 2.5.3, we are ultimately interested in the posterior trust, i.e., the posterior mean of the reputation distribution. When the prior mean is given by m_δ, the posterior mean of the Dirichlet process is given by

E[θ | x_1, ..., x_n] = w_n m_δ + (1 − w_n) (1/n) Σ_{k=1}^{n} x_k,   (2.19)

where w_n = δ([0, 1]) / (δ([0, 1]) + n) tends to zero as n → ∞ and m_δ is the mean of the normalized base measure. Suppose we take δ([0, 1]) = α + β. Then we have

44 w_n = (α + β) / (α + β + n),   (2.20)

which, even though we are now dealing with real-valued observations on the interval [0, 1], gives the same weights as in Section 2.5.1, where we had binary cooperativeness ratings. In fact, in order to match not just the weights but also the prior mean, we could take our base measure to have total mass α + β and mean α/(α + β), and get exactly the same updating as in Equation (2.18) with real-valued variables instead of binary variables. Once we have seen that the update has this generalizable form using the Dirichlet process, we can also see that the update using binary ratings in Section 2.5.1 can be derived within this framework. If we let the base measure place its mass only on the points {0, 1}, which would suggest our data are binary, then the update for the mean is again exactly Equation (2.18). We can now see that this justification is a very general one.

Following from this discussion, in order to maintain our two parameters in a way that correctly updates the posterior mean, we replace the Bayesian update step with an identical bookkeeping step. After a single transaction, if the assigned probability of cooperativeness were x ∈ [0, 1], the beta parameter updates would be

α ← α + x,    β ← β + (1 − x).   (2.21)

2.5.3 Trust

The main objective of the reputation block is to expose as output a metric that can be used as a representation of the subjective expectation of the other node's future behavior. Up until now we have represented i's reputation of node j

45 with θ, but from here on we represent it with R_ij to make the pairwise reputations more explicit. Given a reputation metric R_ij, we define the trust metric T_ij as node i's prediction of the expected future behavior of node j. T_ij is obtained by taking a statistical expectation of this prediction:

T_ij = E[R_ij] = α / (α + β),   (2.22)

where α and β are the current beta parameters that node i maintains for node j. This trust metric can be used by a node in several ways. Some notable ones are:

(1) Data fusion: T_ij can be used as a weight for a data reading reported by node j. Data fusion can then be performed on these weighted data readings, thereby reducing the impact of untrustworthy nodes.

(2) Node revocation: The evolution of trust over time provides an on-line tool for the end-user to detect compromised or faulty nodes. This can help the end-user to take appropriate countermeasures, such as replacing the misbehaving node or sensor.

(3) Decentralized decision making: In a heterogeneous sensor network, different nodes might be equipped with different capabilities. For example, a few of them might have a more precise temperature sensor or a camera, others may be mobile, etc. Given a requirement to use a particular service from some other node in the network and faced with multiple choices, the value of T_ij can be used as a decision-making criterion.
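As a small illustration of use (1), the C++ sketch below fuses neighbour readings weighted by their trust values; the structure and the sample numbers are invented here purely for illustration and are not taken from the thesis.

#include <cstdio>
#include <vector>

// Illustrative use of the trust metric T_ij as a data-fusion weight:
// readings reported by neighbours are averaged with weights proportional
// to their trust, so low-trust reports contribute little.
struct Report { double reading; double trust; };

double fuse(const std::vector<Report>& reports) {
    double num = 0.0, den = 0.0;
    for (const Report& r : reports) {
        num += r.trust * r.reading;   // weight each reading by T_ij
        den += r.trust;
    }
    return den > 0.0 ? num / den : 0.0;
}

int main() {
    std::vector<Report> reports = {{36.8, 0.9}, {37.0, 0.8}, {41.5, 0.1}};
    std::printf("fused reading = %.2f\n", fuse(reports));  // low-trust outlier damped
    return 0;
}

The same trust value could equally drive the node-revocation or decentralized decision-making uses listed above.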

46 2.6 Summary

In this chapter, an overview of Q-learning, which is a reinforcement learning method, has been given. We provided a concise background on theories related to reinforcement learning, including the Markov decision process. Furthermore, we also presented an overview of reputation and trust systems. In the next chapter, a reinforcement learning based routing scheme for biomedical mobile wireless sensor networks using trust and reputation is presented and its routing performance is compared with an existing algorithm.

47 CHAPTER III
RL-BASED ROUTING IN BIOMEDICAL MOBILE WIRELESS SENSOR NETWORKS USING TRUST AND REPUTATION

3.1 Introduction

In this chapter, routing issues in biomedical wireless sensor networks are investigated. Parameters such as body temperature, blood pressure and heart rate are sensed at a patient and transmitted via intermediate sensor nodes to a base station at a medical center. The data is used for health status monitoring, diagnosis and treatment. For example, Z. Pang, Q. Chen, and L. Zheng (2009) and E. Jovanov, C. Poon, Y. Guang-Zhong, and Y.T. Zhang (2009) proposed the use of wireless sensors to monitor vital signs of patients in hospital and home environments. The most important task of biomedical sensor networks is to ensure that data can be delivered to the medical center reliably and efficiently (R.S.H. Istepanian, E. Jovanov, Y.T. Zhang, 2004). Furthermore, in biomedical sensor networks, patients may be moved to an emergency room, and medical staff may want to know their information continuously. Therefore, the use of a mobile wireless sensor network (mWSN) is necessary for biomedical sensor networks. Distributed, lightweight, and highly adaptive routing protocols based on methods such as reinforcement learning (RL) have been proposed for such rapidly changing wireless network conditions (E. Gelenbe and M. Gellman, 2007; L. Xuedong, I. Balasingham, and S.S. Byun, 2008).

48 RL is a technique that has been used to support routing in dynamic topology networks. RL is the study of how artificial systems can learn to optimize their behavior by using their experience through rewards and punishments. There are some works which applied RL to solve routing problems in static WSNs (A. Forster, A.L. Murphy, J. Schiller, and K. Terfloth, 2008). In (E. Gelenbe and M. Gellman, 2007), the authors proposed a Cognitive Packet Network (CPN) which made routing decisions in the presence of routing oscillations using RL and a neural network model. L. Xuedong, I. Balasingham, and S.S. Byun (2008) proposed RL-QRP, an RL-based routing protocol with a QoS routing scheme for mWSNs. They investigated the impact of network traffic load and sensor node mobility on the network performance. However, their results were based on the assumption that all nodes cooperated in the packet forwarding process. A more realistic scenario would require consideration of situations in which some nodes do not cooperate with each other (i.e., by dropping packets they receive), either due to node battery depletion, malfunctioning or simply misbehaving for unknown reasons (U. Varshney, 2008). Since in biomedical sensor networks data packets must be delivered to their destination node reliably, means to identify and avoid these malicious nodes are necessary (D. He, C. Chen, S. Chan, J. Bu, and A. Vasilakos, 2012). Reputation and trust schemes have been used to identify well-behaved and malicious nodes in WSNs (D. He, C. Chen, S. Chan, J. Bu, and A. Vasilakos, 2012; H. Yu, Z. Shen, C. Miao, C. Leung, and D. Niyato, 2010). In such schemes, a sensor node continuously builds a reputation value for other nodes by monitoring their behavior. Then the sensor node uses this reputation value to evaluate the trustworthiness of other nodes. D. He, C. Chen, S. Chan, J. Bu, and A. Vasilakos

49 (2012) proposed a trust scheme called ReTrust for medical WSNs which is lightweight and attack-resistant. High malicious node detection rates and average packet delivery ratios were achieved via simulation and an experimental test-bed. However, sensor node mobility was not explicitly addressed.

Therefore, the objective of this chapter is to solve the routing problem for non-cooperative mWSNs based on RL by incorporating a reputation and trust mechanism which screens out nodes with malicious behavior using reputation and trust values maintained at the sensor nodes. We compared its performance with an existing reinforcement learning routing scheme called RL-QRP (L. Xuedong, I. Balasingham, and S.S. Byun, 2008) under various mobility and malicious node scenarios.

3.2 RL-QRP

The Reinforcement Learning based Routing Protocol with QoS Support for Biomedical Sensor Networks (RL-QRP) has been proposed to learn routing policies that find optimal paths through experience and rewards (L. Xuedong, I. Balasingham, and S.S. Byun, 2008). It uses Q-learning, which learns the action-value function Q(s, a) to find an optimal decision policy. Each time an action is selected, the agent receives an immediate reward from the environment. The agent then uses this reward in the one-step update rule

Q(s_i, a_j) ← (1 − α) Q(s_i, a_j) + α [ r + γ Q(s_j, a') ],   (3.1)

where the Q-value Q(s_i, a_j) denotes the quality of action a_j at state s_i, α is the learning rate, γ is the discount factor, and Q(s_j, a') denotes the expected future reward at state s_j obtained by taking action a'.

50 The updated Q-values then in turn affect the future decisions of the agent. RL-QRP requires the use of location information parameters to calculate a reward following a particular action. Therefore, the protocol can find the shortest path from a source node to a destination node using a reward function of the form

r = (D(s_i, s_sink) − D(s_j, s_sink)) / D(s_i, s_j),   provided that T_delay(s_i, s_j) ≤ T_Q,   (3.2)

where D(s_i, s_sink) and D(s_j, s_sink) are the distances between node s_i and the destination node and between node s_j and the destination node, respectively, D(s_i, s_j) is the distance between node s_i and node s_j, T_Q is the end-to-end delay requirement encapsulated in the data packet, and T_delay(s_i, s_j) is the experienced delay between node s_i and s_j.

Figure 3.1 RL-QRP routing model

51 The basic idea of RL-QRP follows Figure 3.1. Each node in the biomedical sensor network is considered as a state belonging to the set S = {s_i}, i = 1, 2, ..., N, where N is the number of sensor nodes. For each node s_i with a neighbor s_j, an action can be selected from A = {a(s_i, s_j)}. Note that a(s_i, s_j) refers to a packet being forwarded from state s_i to s_j, provided that s_i and s_j are within each other's communication range. Suppose that a node s_i in Figure 3.1 must forward a packet to the sink node through some intermediate node. Node s_i then checks the Q-values of its neighboring nodes and forwards the packet to the neighbor with the highest Q-value; suppose this is node s_j. After that, node s_i updates its Q-value Q(s_i, a(s_i, s_j)) according to (3.1) with the reward in (3.2). The process is repeated at node s_j and the following consecutive nodes until the packet reaches the sink node. Thus, the nodes can find the optimal route through experience and rewards without complicated prediction techniques or frequent explicit updates. Therefore, this process is well-suited for dynamic topologies.
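The following C++ sketch captures the hop-by-hop behaviour just described: a node forwards to the neighbour with the highest Q-value and then updates that Q-value with the distance- and delay-based reward. The data structures, the zero reward in the violated-delay branch, and the exact reward form follow the reconstruction of Equations (3.1)–(3.2) above; they are illustrative assumptions, not the thesis simulation code.

#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the hop-by-hop RL-QRP decision: forward to the neighbour
// with the highest Q-value, then update that Q-value with the
// distance/delay based reward.
struct Node {
    double x = 0.0, y = 0.0;          // current position
    std::vector<int> neighbors;       // ids of nodes within radio range
    std::vector<double> q;            // Q-value per neighbour (same order)
};

double dist(const Node& a, const Node& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Index (into neighbors/q) of the neighbour with the highest Q-value.
int nextHop(const Node& ni) {
    if (ni.neighbors.empty()) return -1;
    std::size_t best = 0;
    for (std::size_t k = 1; k < ni.q.size(); ++k)
        if (ni.q[k] > ni.q[best]) best = k;
    return (int)best;
}

// Reward for forwarding from i to j: normalized distance progress toward
// the sink, granted when the experienced delay meets the requirement T_Q.
// The zero value for the violated-delay case is an assumption.
double reward(const Node& ni, const Node& nj, const Node& sink,
              double delay_ij, double T_Q) {
    if (delay_ij > T_Q) return 0.0;
    return (dist(ni, sink) - dist(nj, sink)) / dist(ni, nj);
}

// One-step update of node i's Q-value for its k-th neighbour, Eq. (3.1).
// nextQ is the expected future reward reported for the next node.
void updateQ(Node& ni, int k, double r, double nextQ,
             double alpha, double gamma) {
    ni.q[k] = (1.0 - alpha) * ni.q[k] + alpha * (r + gamma * nextQ);
}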

52 3.3 Reputation

Reputation and trust systems have proved to be useful mechanisms for addressing the threat of compromised or faulty entities. Such systems operate by identifying selfish peers and excluding these entities from the network. S. Buchegger and J.-Y. Le Boudec (2002) considered routing protocols in MANETs using both first-hand and second-hand information for updating reputation values. S. Ganeriwal and M. B. Srivastava (2004) and D. He, C. Chen, S. Chan, J. Bu, and A. Vasilakos (2012) considered both first- and second-hand reputation and trust-based models developed exclusively for sensor networks. In D. He, C. Chen, S. Chan, J. Bu, and A. Vasilakos (2012), a two-tier trust management architecture was proposed in which a master node computes the trust values for the sensor nodes within its range. In (S. Ganeriwal and M. B. Srivastava, 2004), a watchdog mechanism was used to build the trust rating system. Given a reputation value obtained from the watchdog, the trust metric based on the beta distribution (H. Yu, Z. Shen, C. Miao, C. Leung, D. Niyato, 2010) can be computed by

T_ij = E[R_ij] = (p + 1) / (p + n + 2),   (3.3)

where T_ij refers to node s_i's prediction of the expected future behavior of node s_j, p and n are the numbers of positive and negative outcomes of a specific event, respectively, and R_ij refers to the reputation metric. In particular, p and n are the numbers of successes and failures in forwarding packets between two nodes, respectively. The first-hand or direct reputation value can be determined from the direct observation of node s_j (the observed node) experienced by node s_i. From Figure 3.1, suppose that node s_i prefers to forward the data packet to the destination node by the shortest path via node s_j. In effect, an interaction occurs between node s_i and node s_j. We used a simple binary reputation rating scheme, where a successful outcome (p) is incremented if node s_j forwards the packet and a failed outcome (n) is incremented if node s_j does not forward the packet. Note that p, n ≥ 0, so that the trust value is normalized to the range [0, 1] and the initial value of trust is 0.5. On the other hand, the indirect reputation value can be determined from direct reputation values of node s_j recommended by its

Although aggregated second-hand information (i.e., inquiring from the watchdog the values of $R_{kj}$ held by other nodes $k$ which interacted with node $j$ in the past) helps accelerate the calculation of the reputation value, this chapter considers first-hand observation, or direct reputation, for the sake of simplicity. Furthermore, drawbacks of indirect reputation include vulnerability to bad-mouthing attacks, and the watchdog may not be able to capture all relevant information in the network (H. Yu, Z. Shen, C. Miao, C. Leung, D. Niyato, 2010).

3.4 RL-QRP with Reputation and Trust

In this section, RL-based routing integrated with reputation and trust, called QRT, is described. We redefine the states, actions and rewards as follows:

a) Q-value: let $Q_i(s_j, a)$ denote the opinion of node $i$ about node $j$, which is updated when node $j$ forwards or drops packets passed to it by node $i$:

$$Q_i(s_j, a) \leftarrow (1 - \alpha)\, Q_i(s_j, a) + \alpha \big[ r + T_{ij} + \gamma \max_{a'} Q_j(s_k, a') \big], \qquad (3.4)$$

where the Q-value $Q_i(s_j, a)$ denotes the quality of forwarding packets at node $j$ as experienced by node $i$, and $T_{ij}$ denotes the level of trust in node $j$ experienced by node $i$, which is quantized into intervals of 0.1. The trust value takes values in the range $[0, 1]$.

b) State: $S = \{s_i\}$, $i = 1, 2, \ldots, N$, where $N$ is the number of sensor nodes. Each node is a state in $S$.

c) Trust: $T_{ij}$ is the trust value that quantifies the trustworthiness of node $j$ in forwarding packets from node $i$; we integrated it with the original Q-value of the RL-QRP algorithm by averaging the Q-value and the trust value.

d) Action: $A = \{a(s_i, s_j)\}$, $s_i, s_j \in S$. Execution of $a(s_i, s_j)$ means that the packet is forwarded from state $s_i$ to $s_j$, provided that $s_i$ and $s_j$ are within each other's communication range.

e) Reward function: $r$ is the reward for executing an action at node $i$ (e.g., node $i$ forwards the packet to node $j$), given by

$$r = \frac{D_{i,d} - D_{j,d}}{D_{i,j}} \quad \text{if } d_{i,j} \le T_q. \qquad (3.5)$$

Note that we assumed that every node in the network always sends an ACK back to its upstream node, regardless of its behavior. $D_{i,d}$ and $D_{j,d}$ are the distances from node $i$ and node $j$ to the destination node $d$, respectively, $D_{i,j}$ is the distance between node $i$ and node $j$, $T_q$ is the end-to-end delay requirement encapsulated in the data packet, and $d_{i,j}$ is the experienced delay between node $i$ and node $j$. The pseudo code of the proposed QRT routing algorithm is shown in Table 3.1; a compact C++ sketch of the per-node bookkeeping is given first.
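Before the pseudo code, the sketch below illustrates how the quantities defined in a) to e) could fit together at a single node: the Beta-based trust of (3.3), an epsilon-greedy next-hop choice as in lines 14 to 19 of Table 3.1, and a Q-update in the spirit of (3.4). Because the text describes trust both as an additional reward term in (3.4) and as averaged with the Q-value in item c), the sketch does both, purely as an illustration; all class, member and parameter names are assumptions, and the per-action structure of $Q_i(s_j, a)$ is collapsed into a single value per neighbour for brevity.

#include <algorithm>
#include <cstddef>
#include <random>
#include <unordered_map>
#include <vector>

// Illustrative per-neighbour bookkeeping kept by one node; names are assumed.
struct NeighborStats {
    double q = 0.0;      // Q-value: quality of forwarding via this neighbour
    double alpha = 0.0;  // successful forwarding outcomes (Beta parameter)
    double beta = 0.0;   // failed forwarding outcomes (Beta parameter)

    // Trust from the Beta reputation model, eq. (3.3); starts at 0.5.
    double trust() const { return (alpha + 1.0) / (alpha + beta + 2.0); }

    // Ranking score: average of Q-value and trust, as described in item c).
    double score() const { return 0.5 * (q + trust()); }
};

class QrtNode {
public:
    QrtNode(double learningRate, double discount, double epsilon)
        : alpha_(learningRate), gamma_(discount), eps_(epsilon),
          rng_(std::random_device{}()) {}

    // Epsilon-greedy next-hop choice over the current (non-empty) neighbour
    // set, mirroring lines 14-19 of Table 3.1.
    int selectNextHop(const std::vector<int>& neighbourIds) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        if (u(rng_) < eps_) {
            std::uniform_int_distribution<std::size_t> pick(0, neighbourIds.size() - 1);
            return neighbourIds[pick(rng_)];                     // explore
        }
        return *std::max_element(neighbourIds.begin(), neighbourIds.end(),
            [this](int a, int b) { return stats_[a].score() < stats_[b].score(); });
    }

    // Q-update after forwarding via `next` and receiving reward r, where
    // bestNextQ is the best Q-value reported by the chosen neighbour; trust
    // enters as an additional reward term, following the description of (3.4).
    void updateQ(int next, double r, double bestNextQ) {
        NeighborStats& s = stats_[next];
        s.q = (1.0 - alpha_) * s.q + alpha_ * (r + s.trust() + gamma_ * bestNextQ);
    }

    // Binary reputation rating: count a success when the neighbour is observed
    // forwarding the packet, and a failure when it drops it.
    void updateTrust(int next, bool forwarded) {
        if (forwarded) stats_[next].alpha += 1.0; else stats_[next].beta += 1.0;
    }

private:
    double alpha_, gamma_, eps_;
    std::mt19937 rng_;
    std::unordered_map<int, NeighborStats> stats_;
};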

TABLE 3.1 QRT routing algorithm

01  Begin
02    Initialization
03    Set timer for beacon exchange
04    Begin Loop
05      If timer expires
06        Broadcast beacon to immediate neighboring nodes
07        Re-set timer
08      Endif
09      If beacon packet arrives
10        Update neighboring node's position and Q-value
11      Endif
12      If data packet arrives
13        If good node
14          Generate random number
15          If random number > ε
16            Select neighboring node with highest Q-value
17          Else
18            Randomly select neighboring node
19          Endif
20          Receive reward r
21          Update the Q-value
22          Update trust
23        Else
24          Drop packet
25        Endif
26      Endif
27    Go to 04
28  End

3.5 Performance Evaluation

In this section, we evaluated the proposed QRT routing algorithm, which integrated the existing RL-QRP (L. Xuedong, I. Balasingham, and S.S. Byun, 2008) with the reputation and trust scheme. Results were compared with the original RL-QRP and a non-learning threshold reputation scheme. The latter scheme ranked the trust values of the neighboring nodes and selected the next node with the highest trust value above a predetermined threshold of 0.4, which was found to give the best performance among the threshold values tested. Visual C++ was used to simulate an mwsn under various conditions according to Table 3.2 and Table 3.3.

A number of nodes in the mwsn were mobile and followed the random waypoint mobility model, which is suitable for modeling a user's mobility in a confined area or within a hospital. The velocity was randomly chosen from [0, 5] m/s. The remaining nodes were assumed static. These parameters are suitable for biomedical applications, where each node represents a patient to whom a health-monitoring sensor node is attached. Each experiment was repeatedly run with different seeds, each with a runlength of 10^6 events, until the sample-averaged results were within a 10% range.

Unconstrained Traffic Demand

Initially, we evaluated the routing performance of the algorithms when there was no constraint on the QoS of the traffic demand. This experiment was divided into two parts, in which we considered the cases when the node mobility was varied and when the number of malicious nodes present in the network was varied.

Part 1: Malicious Nodes Effect

In this experiment, there were 9 mobile nodes out of 36 nodes. To study the effect of malicious nodes and the degree to which they misbehave, the number of malicious nodes was varied from 9 to 18 and their packet dropping probability was varied from 0 to 1. The following metrics were measured:

TABLE 3.2 Simulation Parameters

Parameters                                 Part 1                    Part 2
Number of sensor nodes                     36                        36
Node mobility                              Random waypoint           Random waypoint
Pause time (s)                             60                        60
Node velocity (m/s)                        Min. 0, Max. 5            Min. 0, Max. 5
Area size                                  200 x 200 m^2             200 x 200 m^2
Transmission range                         50 m                      50 m
Runlength (number of route requests)       10^6                      10^6
Learning rate (α) for RL-QRP, QRT          0.5                       0.5
Discount factor (γ) for RL-QRP, QRT        0.5                       0.5
Number of mobile nodes                     9                         0, 9, 18, 27, 36
Number of malicious nodes                  9, 18                     9
Probability of dropping a packet           0, 0.25, 0.5, 0.75, 1     fixed (see Part 2)

Average success ratio (%) is given by

$$\text{Average success ratio (\%)} = \frac{\text{number of successfully discovered paths}}{\text{total number of route requests}} \times 100. \qquad (3.6)$$

This metric is the proportion of successfully discovered paths. Figure 3.2 illustrates the average success ratio for the QRT, RL-QRP and threshold schemes as the packet dropping probability was varied. Note that for all packet dropping probabilities, the average success ratio of QRT was up to 11% greater than RL-QRP and up to 25% greater than the threshold scheme. Such results indicated that QRT can identify and avoid malicious nodes more effectively than the RL-QRP and threshold schemes and thereby discover more paths that can reach the destination node.

Average end-to-end delay: In Figure 3.3, the average end-to-end delay is shown against the packet dropping probability. Note that QRT showed a higher average end-to-end delay than RL-QRP. This was because QRT discovered more paths than the other schemes, as shown in the previous figure. As Figure 3.4 shows, such paths included both short paths (2 or 3 hops), in numbers comparable to RL-QRP, and long paths (4 hops and up), which were discovered in significantly greater numbers than with RL-QRP. The threshold scheme discovered the fewest short paths of all, thus obtaining the highest average end-to-end delay.

Figure 3.2 Average success ratio of discovered paths

Figure 3.3 Average end-to-end delay of discovered paths

Part 2: Mobility Effect

In this part, the algorithms' performance when varying node mobility was investigated. For this scenario, 9 malicious nodes were present, each with the same packet dropping probability. This setting was used because a high success ratio was observed for all schemes, so the effect of increased mobility would be more visible. The degree of mobility was varied by increasing the number of moving nodes from 0 (least mobile) to 36 (most mobile).
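As a reference for the two behaviours varied in these experiments, the following is a minimal C++ sketch of a random-waypoint mobile node and of the stochastic packet dropping of a malicious node, under the parameter ranges of Table 3.2 (square area, velocity drawn uniformly from [0, vMax] m/s). The class and function names are illustrative, the pause-time handling is omitted for brevity, and the update step dt is an assumption.

#include <cmath>
#include <random>

// Illustrative sketch only; names and the update granularity are assumptions.
struct Waypoint { double x, y; };

class MobileNode {
public:
    MobileNode(double areaSize, double vMax, unsigned seed)
        : rng_(seed), uniPos_(0.0, areaSize), uniVel_(0.0, vMax) {
        pos_ = {uniPos_(rng_), uniPos_(rng_)};
        pickNewWaypoint();
    }

    // Advance the node by dt seconds under the random waypoint model: move
    // toward the current waypoint at the chosen speed; once it is reached,
    // draw a new waypoint and a new speed (pause time omitted here).
    void step(double dt) {
        const double dx = target_.x - pos_.x, dy = target_.y - pos_.y;
        const double dist = std::sqrt(dx * dx + dy * dy);
        const double move = speed_ * dt;
        if (move >= dist) { pos_ = target_; pickNewWaypoint(); return; }
        pos_.x += move * dx / dist;
        pos_.y += move * dy / dist;
    }

    Waypoint position() const { return pos_; }

private:
    void pickNewWaypoint() {
        target_ = {uniPos_(rng_), uniPos_(rng_)};
        speed_  = uniVel_(rng_);   // speed drawn uniformly from [0, vMax] m/s
    }
    double speed_ = 0.0;
    Waypoint pos_{}, target_{};
    std::mt19937 rng_;
    std::uniform_real_distribution<double> uniPos_, uniVel_;
};

// A malicious node drops a received data packet with the configured
// packet dropping probability (varied in Part 1 of the experiments).
bool maliciousDrops(std::mt19937& rng, double dropProbability) {
    std::bernoulli_distribution drop(dropProbability);
    return drop(rng);
}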

Figure 3.4 Number of discovered paths for each path length for 9 malicious nodes

Average success ratio (%): Figure 3.5 illustrates the average success ratio for all schemes. Note that QRT consistently outperformed both the RL-QRP and threshold schemes, by up to 9% and 22%, respectively. However, the margin between QRT and RL-QRP decreased as mobility increased.

Average end-to-end delay: In Figure 3.6, the average end-to-end delay is shown versus the number of moving nodes. Similar to Figure 3.3, the average end-to-end delay of QRT was greater than that of RL-QRP but less than that of the threshold scheme. This was because, as shown in Figure 3.7, QRT found more long paths (4 hops and up) than RL-QRP and the threshold scheme, while obtaining a number of short paths (2 or 3 hops) comparable to RL-QRP. Furthermore, although the number of discovered paths gradually decreased as mobility increased, QRT consistently discovered more paths than the other schemes.

Figure 3.5 Average success ratio under various degrees of mobility

Figure 3.6 Average end-to-end delay of discovered paths under various degrees of mobility

Figure 3.7 Number of discovered paths for each path length under various degrees of mobility

Traffic Demand with End-to-End Delay QoS

In this experiment, there were 9 mobile nodes and 9 malicious nodes present in the 36-node mwsn. To study the impact of the QoS requirement on the network, the end-to-end delay requirement ($T_q$) was varied over 50, 100, 200 and 300 msec. The remaining simulation parameters are shown in Table 3.3.

TABLE 3.3 Simulation Parameters

Parameters                                 Value
Number of sensor nodes                     36
Node mobility                              Random waypoint
Pause time (s)                             60
Node velocity (m/s)                        Min. 0, Max. 5
Area size                                  200 x 200 m^2
Transmission range                         50 m
Runlength (number of route requests)       10^6
Learning rate (α) for RL-QRP, QRT          0.5
Discount factor (γ) for RL-QRP, QRT        0.5
Number of mobile nodes                     9
Number of malicious nodes                  9
Probability of dropping a packet           0, 0.5
End-to-end delay requirement (msec)        50, 100, 200, 300

Average success ratio: In Figures 3.8 and 3.9, the average success ratio is shown against the end-to-end delay requirement ($T_q$). In this experiment, we modified the proposed QRT and the existing RL-QRP to handle different stringent end-to-end delay requirements. In particular, the reward function (3.5) was modified by varying $T_q$ accordingly for both algorithms. We thus refer to them as QRT_Tq_reward and RL-QRP_Tq_reward, respectively.

Furthermore, we also evaluated a more aggressive approach to finding paths that meet the end-to-end delay requirement, by allowing the agents in both algorithms to search for next hops only on paths whose estimated delay so far does not exceed the end-to-end delay requirement. This modification discovers paths which strictly satisfy the QoS requirement, so we refer to the resulting schemes as QRT_strict and RL-QRP_strict, respectively. The value of $T_q$ was varied in the range 50-300 msec. We considered the cases when the packet dropping probability was 0 and 0.5. From Figures 3.8 and 3.9, we can see that QRT consistently outperformed RL-QRP. In addition, the average success ratios of QRT_Tq_reward and RL-QRP_Tq_reward are greater than those of QRT_strict and RL-QRP_strict. The reason is that QRT_Tq_reward and RL-QRP_Tq_reward cannot screen out the paths whose path delay exceeds the end-to-end delay requirement, as shown in Figures 3.12 to 3.15. Furthermore, the average success ratio of QRT_strict and RL-QRP_strict decreased as $T_q$ became more stringent, because these two methods conservatively filter out paths whose delay exceeds $T_q$.
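A minimal C++ sketch of the strict variant's admission rule is given below; the structure and names are assumptions, but the test mirrors the rule just described: a neighbour is only considered if the delay accumulated so far plus its estimated hop delay still fits within $T_q$.

#include <vector>

// Illustrative sketch of the "strict" QoS next-hop filter; names are assumed.
struct Candidate {
    int    nodeId;
    double score;          // combined Q-value / trust score of the neighbour
    double estHopDelayMs;  // estimated one-hop delay to this neighbour
};

// Returns the admissible candidate with the highest score, or -1 when no
// neighbour can still meet the delay requirement (the route request fails).
int selectNextHopStrict(const std::vector<Candidate>& neighbours,
                        double delaySoFarMs, double tqMs) {
    int best = -1;
    double bestScore = 0.0;
    for (const Candidate& c : neighbours) {
        if (delaySoFarMs + c.estHopDelayMs > tqMs) continue;  // filter late paths
        if (best == -1 || c.score > bestScore) {
            best = c.nodeId;
            bestScore = c.score;
        }
    }
    return best;
}

Because a route request can fail outright when no admissible neighbour remains, the strict variants trade success ratio for guaranteed delay, which is the behaviour observed in Figures 3.8 to 3.11.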

Figure 3.8 Average success ratio under different end-to-end delay requirements and a malicious-node packet dropping probability of 0

Figure 3.9 Average success ratio under different end-to-end delay requirements and a malicious-node packet dropping probability of 0.5

Average end-to-end delay: In Figures 3.10 and 3.11, the average end-to-end delay is shown against the end-to-end delay requirement when the packet dropping probability is 0 and 0.5, respectively. Note that the average end-to-end delays of QRT and RL-QRP are similar. The average end-to-end delays of QRT_strict and RL-QRP_strict strictly satisfy $T_q$, because these schemes select only paths whose delays do not exceed $T_q$. However, QRT_Tq_reward and RL-QRP_Tq_reward cannot screen out such path delays by means of reward modification alone.

Figure 3.10 Average end-to-end delay under different end-to-end delay requirements and a malicious-node packet dropping probability of 0

Figure 3.11 Average end-to-end delay under different end-to-end delay requirements and a malicious-node packet dropping probability of 0.5

Figure 3.12 Number of discovered paths under different end-to-end delays (packet dropping probability 0, $T_q$ = 100 msec)

Figure 3.13 Number of discovered paths under different end-to-end delays (packet dropping probability 0.5, $T_q$ = 100 msec)

Figure 3.14 Number of discovered paths under different end-to-end delays (packet dropping probability 0, $T_q$ = 200 msec)

Figure 3.15 Number of discovered paths under different end-to-end delays (packet dropping probability 0.5, $T_q$ = 200 msec)

3.6 Conclusion

We proposed the QRT routing algorithm for non-cooperative mwsns comprising malicious, stochastically packet-dropping nodes. QRT was based on an RL routing method which incorporated a reputation and trust mechanism to screen out malicious nodes. The mechanism employed direct reputation of observed nodes to evaluate their trust values. We compared QRT against the RL-QRP and threshold schemes. Results showed that the average success ratio of QRT was up to 11% and 25% greater than RL-QRP and the heuristic non-learning threshold scheme, respectively. As the mobility of the network increased, QRT consistently outperformed the other algorithms, gaining up to 9% and 22% in success ratio over the RL-QRP and threshold schemes. The results suggest that a reputation and trust mechanism can be applied to identify and avoid malicious packet-dropping nodes in mwsns.

In terms of quality-of-service, the results have shown that QRT consistently outperformed RL-QRP even in the presence of a high packet dropping probability and stringent end-to-end delay requirements. The results suggest that the QRT scheme, with its reputation and trust mechanism, can be applied to cater for quality-of-service in mwsns.

CHAPTER IV
CONCLUSION AND FUTURE WORK

4.1 Conclusion

In this thesis, we proposed a routing method called the QRT algorithm for non-cooperative mwsns based on Reinforcement Learning (RL). In particular, QRT was the integration of a reputation and trust scheme, used to avoid misbehaving nodes, with an existing RL-based routing protocol called RL-QRP. We evaluated its performance in non-cooperative mwsns under various non-cooperation, mobility and end-to-end delay conditions. The experimental work carried out in this thesis was divided into two parts, covering unconstrained and delay-constrained traffic demands. In the first experiment, we varied the number of malicious nodes and the number of mobile nodes to study their impact, and compared the results with the original RL-QRP algorithm and a non-learning threshold scheme in terms of average success ratio (%), average end-to-end delay and the number of discovered paths for each path length. In the subsequent experiment, we extended the framework to include delay-constrained quality-of-service in the simulation. We considered two types of modification, QRT_strict and QRT_Tq_reward, and compared the results with the same modifications of RL-QRP in terms of average success ratio (%), average end-to-end delay and the number of discovered paths under different end-to-end delay requirements. These two parts were presented in Chapter 3. The original contributions and findings of this thesis can be summarized as follows.

QRT

The first contribution was the proposed QRT scheme, which showed that the Q-learning algorithm can be applied to support routing in mwsns which include misbehaving nodes. We extended the state space, which originally consisted of only the neighboring nodes of an agent, to include quantized trust levels of those neighbors as well. We also modified the Q-value updating equation (3.4) by adding the trust value as an additional reward term, in order to take account of the trust level between nodes. Performance comparisons were made with the existing RL-QRP algorithm and the threshold scheme. The simulation in the first part varied the number of malicious nodes along with the packet dropping probability of a malicious node. In the second part, the simulation varied the number of mobile nodes. The experimental results showed that the QRT method consistently outperformed RL-QRP and the threshold scheme in terms of success ratio when varying the number of malicious nodes, achieving up to 11% and 25% more than the two schemes, respectively. The QRT method also discovered more long paths than the other schemes. When the number of mobile nodes increased, QRT gained up to 9% and 22% in success ratio over the RL-QRP algorithm and the threshold scheme, respectively.

Quality-of-Service

The purpose of this section was to add quality-of-service, in terms of an end-to-end delay requirement, to the simulation. In the first part of this study, we modified the end-to-end delay requirement $T_q$ in the Q-learning reward. The results showed that varying $T_q$ alone cannot screen out paths whose end-to-end delay exceeds $T_q$.

An alternative approach was then trialed which selected next-hop nodes whose path delay so far had not yet exceeded $T_q$. The results suggested that QRT performed well in scenarios where end-to-end delay quality-of-service was required by the traffic demands, even in the presence of malicious nodes, achieving a success ratio up to 11% higher than RL-QRP.

The significance of our work lies in proposing means to enhance routing in the presence of misbehaving nodes in mwsns. We studied the effects of mobility and different degrees of malicious node behavior. Moreover, we added quality-of-service to the experiments for a more realistic biomedical application scenario using mwsns. We can conclude that the QRT approach obtains better routing performance than RL-QRP and the threshold scheme by detecting and avoiding malicious nodes in mwsns under various conditions of packet dropping probability, node mobility and stringent end-to-end delay requirements.

4.2 Future Work

mwsns with Indirect Reputation Value

To study the effect of the indirect reputation value, which is the opinion about the next node reported by other neighboring nodes (for example, when node i considers forwarding a packet to node j, node i obtains node k's opinion about node j to evaluate the trustworthiness of node j), note that Srinivasan and Teitelbaum (2006) proposed the distributed reputation-based beacon trust system (DRBTS), which used both direct and indirect reputation based on the Beta distribution to weight the reward for a node's decision in choosing the next node. A possible direction for future extension of this thesis is therefore to include indirect reputation in the framework.
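As a rough illustration of what such an extension might look like, the sketch below folds second-hand Beta parameters into a node's direct reputation of a neighbour, discounting them by a fixed weight so that recommendations count less than first-hand observation. The weight value, the struct and the function names are assumptions made for this illustration; they are not taken from DRBTS or from this thesis.

#include <vector>

// Illustrative only: one simple way to combine direct and indirect reputation.
struct BetaReputation {
    double alpha = 0.0;  // positive outcomes observed
    double beta  = 0.0;  // negative outcomes observed
    double trust() const { return (alpha + 1.0) / (alpha + beta + 2.0); }
};

// Node i's direct reputation of j, combined with recommendations about j
// reported by neighbouring nodes k, each discounted by indirectWeight (< 1).
BetaReputation combineReputation(const BetaReputation& direct,
                                 const std::vector<BetaReputation>& recommendations,
                                 double indirectWeight = 0.5) {
    BetaReputation combined = direct;
    for (const BetaReputation& rec : recommendations) {
        combined.alpha += indirectWeight * rec.alpha;
        combined.beta  += indirectWeight * rec.beta;
    }
    return combined;
}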

Traffic Priority

In biomedical mobile wireless sensor networks there is a great variety of health information, and the significance of each type of information differs. Priority could be given to important information, such as heart rate, over delay-tolerant traffic, such as temperature, by reserving short routes for important information only, in order to avoid packet collisions and excessive buffering. Hence, traffic and route prioritization are promising directions for further study.

Performance Evaluation on a Test Bed

The main objective of this thesis was to improve routing performance in mwsns by using RL with trust and reputation. The experiments were simulated in a Visual C++ environment to perform the learning process and evaluate the algorithms. Therefore, an important future direction is to extend the work towards real data collection for training the learning algorithm in actual mwsns.

mwsns with Energy Consumption Condition

Energy consumption in mwsns is one of the most important issues. Dealing with the energy problem in mwsns by expanding the state space with the remaining battery level of each node, and making energy-aware routing decisions at intermediate nodes along the route, warrants further investigation.

REFERENCES

Aghaei, R., Rahman, A., Gueaieb, W., Saddik, A. (2007). Ant Colony-Based Reinforcement Learning Algorithm for Routing in Wireless Sensor Networks. Proceedings of the Instrumentation and Measurement Technology Conference.

Blaze, M., Feigenbaum, J., Lacy, J. (1996). Decentralized Trust Management. Proceedings of Security and Privacy.

Buchegger, S., Le Boudec, J.-Y. (2002). Performance Analysis of the CONFIDANT Protocol (Cooperation of Nodes: Fairness in Dynamic Ad-hoc NeTworks). Proceedings of the Third ACM International Symposium on Mobile Ad Hoc Networking and Computing.

Buchegger, S., Le Boudec, J.-Y. (2003a). Coping with False Accusations in Misbehavior Reputation Systems for Mobile Ad-Hoc Networks. Technical Report IC/2003/31, EPFL-DI-ICA.

Buchegger, S., Le Boudec, J.-Y. (2003b). The Effect of Rumor Spreading in Reputation Systems for Mobile Ad-hoc Networks. Proceedings of Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks.

Chen, H., Wu, H., Zhou, X., Gao, C. (2007). Reputation-based Trust in Wireless Sensor Networks. Proceedings of Multimedia and Ubiquitous Engineering.

Dong, S., Agrawal, P., Sivalingam, K. (2007). Reinforcement Learning Based Geographic Routing Protocol for UWB Wireless Sensor Network. Proceedings of the Global Telecommunications Conference.

Forster, A., Murphy, A.L. (2007). Exploiting Reinforcement Learning for Multiple Sink Routing in WSNs. Proceedings of the National Competence Center in Research on Mobile Information and Communication Systems.

Forster, A., Murphy, A.L., Schiller, J., Terfloth, K. (2008). An Efficient Implementation of Reinforcement Learning Based Routing on Real WSN Hardware. Proceedings of the International Conference on Wireless and Mobile Computing.

Ganeriwal, S., Srivastava, M. B. (2004). Reputation based Framework for High Integrity Sensor Networks. Proceedings of Security of Ad Hoc and Sensor Networks.

Gelenbe, E., Gellman, M. (2007). Oscillations in a Bio-Inspired Routing Algorithm. Proceedings of Mobile Ad Hoc and Sensor Systems.

He, D., Chen, C., Chan, S., Bu, J., Vasilakos, A. (2012). ReTrust: Attack-resistant and Lightweight Trust Management for Medical Sensor Networks. Journal of Information Technology in Biomedicine, vol. 16, no. 4, pp.

Istepanian, R.S.H., Jovanov, E., Zhang, Y.T. (2004). Guest Editorial Introduction to the Special Section on M-Health: Beyond Seamless Mobility and Global Wireless Health-Care Connectivity. Journal of Information Technology in Biomedicine, vol. 8, no. 4, pp.

Josang, A., Knapskog, S.J. (1998). A Metric for Trust Systems. Proceedings of the 21st National Information Systems Security Conference.

Jovanov, E., Poon, C., Guang-Zhong, Y., Zhang, Y.T. (2009). Guest Editorial Body Sensor Networks: From Theory to Emerging Applications. Journal of Information Technology in Biomedicine, vol. 13, no. 6, pp.

Kaelbling, L.P., Littman, M.L., Moore, A.P. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, vol. 4, pp.

Karaki, J.N., Kamal, A.E. (2004). Routing Techniques in Wireless Sensor Networks: a Survey. Journal of Wireless Communications, vol. 11, no. 4, pp.

Kim, K., Kim, H., Hong, Y. (2009). A Self Localization Scheme for Mobile Wireless Sensor Networks. Proceedings of Computer Sciences and Convergence Information Technology.

Kim, K., Lee, I.S., Yoon, M., Kim, J., Lee, H., Han, K. (2009). An Efficient Routing Protocol Based on Position Information in Mobile Wireless Body Area Sensor Networks. Proceedings of Networks and Communications.

Lan Tien Nguyen, Defago, X., Beuran, R., Shinoda, Y. (2008). An Energy Efficient Routing Scheme for Mobile Wireless Sensor Networks. Proceedings of Wireless Communication Systems.

Michiardi, P., Molva, R. (2002). Core: A Collaborative Reputation Mechanism to Enforce Node Cooperation in Mobile Ad hoc Networks. Proceedings of Communications and Multimedia Security.

Pang, Z., Chen, Q., Zheng, L. (2009). A Pervasive and Preventive Healthcare Solution for Medication Noncompliance and Daily Monitoring. Proceedings of Applied Sciences in Biomedical and Communication Technologies.

Puterman, M. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience.

Renaud, J.C., Tham, C.K. (2006). Coordinated Sensing Coverage in Sensor Networks using Distributed Reinforcement Learning. Proceedings of the International Conference on Networks.

Resnick, P., Zeckhauser, R. (2000). Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay's Reputation System. The Economics of the Internet and E-commerce: Advances in Applied Microeconomics, vol. 11, pp.

Resnick, P., Kuwabara, K., Zeckhauser, R., Friedman, E. (2000). Reputation Systems. Communications of the ACM, vol. 43, no. 12, pp.

Seah, M.W.M., Tham, C.K., Srinivasan, V., Xin, A. (2007). Achieving Coverage through Distributed Reinforcement Learning in Wireless Sensor Networks. Proceedings of Intelligent Sensors, Sensor Networks and Information.

Sutton, R., Barto, A. (1998). Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press.

Tanachaiwiwat, S., Dave, P., Bhindwale, R., Helmy, A. (2003). Location-centric Isolation of Misbehavior and Trust Routing in Energy-Constrained Sensor Networks. Proceedings of Performance, Computing and Communications.

Varshney, U. (2008). Improving Wireless Health Monitoring Using Incentive-Based Router Cooperation. IEEE Computer Magazine, vol. 41, no. 5, pp.

Wang, P., Wang, T. (2006). Adaptive Routing for Sensor Networks using Reinforcement Learning. Proceedings of Computer and Information Technology.

Watkins, C. (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge, England.

Xuedong, L., Balasingham, I., Byun, S.S. (2008). A Multi-agent Reinforcement Learning Based Routing Protocol for Wireless Sensor Networks. Proceedings of Wireless Communication Systems.

Xuedong, L., Balasingham, I., Byun, S.S. (2008). A Reinforcement Learning Based Routing Protocol with QoS Support for Biomedical Sensor Networks. Proceedings of Applied Sciences on Biomedical and Communication Technology.

Yadav, V., Mishra, M.K., Gore, M.M. (2009). Localization Scheme for Three Dimensional Wireless Sensor Networks Using GPS Enabled Mobile Sensor Nodes. Journal of Next-Generation Networks, vol. 1, no. 1, pp.

Ying-Hong, W., Chin-Yung, Y., Wei-Ting, C., Chun-Xuan, W. (2008). An Average Energy Based Routing Protocol for Mobile Sink in Wireless Sensor Networks. Proceedings of Ubi-Media Computing.

Yu, H., Shen, Z., Miao, C., Leung, C., Niyato, D. (2010). A Survey of Trust and Reputation Management Systems in Wireless Communications. Proceedings of the IEEE, vol. 98, no. 10, pp.

Zhou, Y., Xing, J., Yu, Q. (2006). Overview of Power-efficient MAC and Routing Protocols for Wireless Sensor Networks. Proceedings of Mechatronic and Embedded Systems and Applications.

APPENDIX

PUBLICATION

Publication

Naputta, Y., and Usaha, W. (2012). RL-based Routing in Biomedical Mobile Wireless Sensor Networks using Trust and Reputation. The 9th International Symposium on Wireless Communication Systems (ISWCS), France, August 2012.



More information

A Self-Learning Repeated Game Framework for Optimizing Packet Forwarding Networks

A Self-Learning Repeated Game Framework for Optimizing Packet Forwarding Networks A Self-Learning Repeated Game Framework for Optimizing Packet Forwarding Networks Zhu Han, Charles Pandana, and K.J. Ray Liu Department of Electrical and Computer Engineering, University of Maryland, College

More information

ว.ว ทย. มข. 45(2) (2560) KKU Sci. J. 45(2) (2017) บทค ดย อ ABSTRACT

ว.ว ทย. มข. 45(2) (2560) KKU Sci. J. 45(2) (2017) บทค ดย อ ABSTRACT ว.ว ทย. มข. 45(2) 418-437 (2560) KKU Sci. J. 45(2) 418-437 (2017) การปร บปร งรห สล บฮ ลล โดยอาศ ยการเข ารห สล บเป นคาบสองช น และการแปรผ นความยาว A Modification of the Hill Cipher Based on Doubly Periodic

More information

Analysis of Cluster-Based Energy-Dynamic Routing Protocols in WSN

Analysis of Cluster-Based Energy-Dynamic Routing Protocols in WSN Analysis of Cluster-Based Energy-Dynamic Routing Protocols in WSN Mr. V. Narsing Rao 1, Dr.K.Bhargavi 2 1,2 Asst. Professor in CSE Dept., Sphoorthy Engineering College, Hyderabad Abstract- Wireless Sensor

More information

Featuring Trust and Reputation Management Systems for Constrained Hardware Devices*

Featuring Trust and Reputation Management Systems for Constrained Hardware Devices* Featuring Trust and Reputation Management Systems for Constrained Hardware Devices* Rodrigo Román, M. Carmen Fernández-Gago, Javier López University of Málaga, Spain *(Wireless Sensor Networks) Contents

More information

Security Enhancements for Mobile Ad Hoc Networks with Trust Management Using Uncertain Reasoning

Security Enhancements for Mobile Ad Hoc Networks with Trust Management Using Uncertain Reasoning Security Enhancements for Mobile Ad Hoc Networks with Trust Management Using Uncertain Reasoning Sapna B Kulkarni,B.E,MTech (PhD) Associate Prof, Dept of CSE RYM Engg.college, Bellari VTU Belgaum Shainaj.B

More information

Planning and Control: Markov Decision Processes

Planning and Control: Markov Decision Processes CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What

More information

Rab Nawaz Jadoon DCS. Assistant Professor. Department of Computer Science. COMSATS Institute of Information Technology. Mobile Communication

Rab Nawaz Jadoon DCS. Assistant Professor. Department of Computer Science. COMSATS Institute of Information Technology. Mobile Communication Rab Nawaz Jadoon DCS Assistant Professor COMSATS IIT, Abbottabad Pakistan COMSATS Institute of Information Technology Mobile Communication WSN Wireless sensor networks consist of large number of sensor

More information

MDR Based Cooperative Strategy Adaptation in Wireless Communication

MDR Based Cooperative Strategy Adaptation in Wireless Communication MDR Based Cooperative Strategy Adaptation in Wireless Communication Aswathy Mohan 1, Smitha C Thomas 2 M.G University, Mount Zion College of Engineering, Pathanamthitta, India Abstract: Cooperation among

More information

Efficient Power Management in Wireless Communication

Efficient Power Management in Wireless Communication Efficient Power Management in Wireless Communication R.Saranya 1, Mrs.J.Meena 2 M.E student,, Department of ECE, P.S.R.College of Engineering, sivakasi, Tamilnadu, India 1 Assistant professor, Department

More information

Mobile Agent Driven Time Synchronized Energy Efficient WSN

Mobile Agent Driven Time Synchronized Energy Efficient WSN Mobile Agent Driven Time Synchronized Energy Efficient WSN Sharanu 1, Padmapriya Patil 2 1 M.Tech, Department of Electronics and Communication Engineering, Poojya Doddappa Appa College of Engineering,

More information

Ameliorate Threshold Distributed Energy Efficient Clustering Algorithm for Heterogeneous Wireless Sensor Networks

Ameliorate Threshold Distributed Energy Efficient Clustering Algorithm for Heterogeneous Wireless Sensor Networks Vol. 5, No. 5, 214 Ameliorate Threshold Distributed Energy Efficient Clustering Algorithm for Heterogeneous Wireless Sensor Networks MOSTAFA BAGHOURI SAAD CHAKKOR ABDERRAHMANE HAJRAOUI Abstract Ameliorating

More information